Prosper Loan Data Analysis by Meichen Zhou

Abstract

This report explores a Prosper loan dataset that contains peer-to-peer loan listing details of consumers with Prosper. We start with exploratory analysis on variables of interest, and then raise some deeper questions on the underlying relationship between variables. Finally, we build a predictive linear model for borrower’s interest rate and justify the output.

Introduction of Data

The prosper data set consists of 113,937 observations and 81 variables. The data is on the level of ListingNumber of each loan. The first step of analysis is to import the data from csv file to R dataframe. Then we get a better understanding of the data by looking at the variable list, variable types, the print-out and the distribution of variables, and the unique count and missing count of selective variables such as borrower rate, term, prosper score, loan amount, credit history etc.

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113048 levels "00003546482094282EF90E5",..: 7180 7192 6492 6514 6531 6534 6544 6551 6532 6532 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 34801 levels "00:00.0","00:00.1",..: 5431 16274 452 1489 22348 15402 30617 28572 25250 25250 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","02:17.6","02:25.0",..: 2384 1 802 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : int  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : logi  TRUE FALSE FALSE TRUE TRUE TRUE ...
##  $ CurrentlyInGroup                   : logi  TRUE FALSE TRUE FALSE FALSE FALSE ...
##  $ GroupKey                           : Factor w/ 707 levels "","00943382969547936B0C529",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 100278 levels "00:00.0","00:00.2",..: 63351 43901 2975 15398 94325 29949 58623 70468 31274 31274 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1/1/1951 0:00",..: 1072 5084 9271 4506 4275 4773 11336 4423 5797 5797 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : int  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : int  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : int  472 0 NA 10056 0 0 0 0 0 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : int  0 3989 NA 1444 6193 62999 5812 1260 9906 9906 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : int  1500 10266 NA 30754 695 86509 1929 2181 77696 77696 ...
##  $ TotalTrades                        : int  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : int  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0 ","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113044 levels "00003683605746079487FF7",..: 100315 69815 46242 70754 71365 86483 91228 5294 881 881 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "1/10/2006 0:00",..: 1729 880 35 314 1783 541 974 1099 476 476 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90824 levels "00003397697413387CAF966",..: 10981 10212 33733 54932 19440 48049 60441 40985 26069 26069 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
##   ListingNumber Term BorrowerRate EstimatedReturn ProsperScore
## 1        193129   36       0.1580              NA           NA
## 2       1209647   36       0.0920         0.05470            7
## 3         81716   36       0.2750              NA           NA
## 4        658116   36       0.0974         0.06000            9
## 5        909464   36       0.2085         0.09066            4
## 6       1074836   60       0.1314         0.07077           10
##   EmploymentStatus IsBorrowerHomeowner CurrentCreditLines TotalInquiries
## 1    Self-employed                TRUE                  5              3
## 2         Employed               FALSE                 14              5
## 3    Not available               FALSE                 NA              1
## 4         Employed                TRUE                  5              1
## 5         Employed                TRUE                 19              9
## 6         Employed                TRUE                 21              2
##   CurrentDelinquencies AmountDelinquent BankcardUtilization
## 1                    2              472                0.00
## 2                    0                0                0.21
## 3                    1               NA                  NA
## 4                    4            10056                0.04
## 5                    0                0                0.81
## 6                    0                0                0.39
##   DebtToIncomeRatio StatedMonthlyIncome LoanOriginalAmount
## 1              0.17            3083.333               9425
## 2              0.18            6125.000              10000
## 3              0.06            2083.333               3001
## 4              0.15            2875.000              10000
## 5              0.26            9583.333              15000
## 6              0.36            8333.333              15000
##   LoanOriginationQuarter Investors CreditScoreRangeLower
## 1                Q3 2007       258                   640
## 2                Q1 2014         1                   680
## 3                Q1 2007        41                   480
## 4                Q4 2012       158                   800
## 5                Q3 2013        20                   680
## 6                Q4 2013         1                   740
##   CreditScoreRangeUpper
## 1                   659
## 2                   699
## 3                   499
## 4                   819
## 5                   699
## 6                   759
##  ListingNumber          Term        BorrowerRate    EstimatedReturn 
##  Min.   :      4   Min.   :12.00   Min.   :0.0000   Min.   :-0.183  
##  1st Qu.: 400919   1st Qu.:36.00   1st Qu.:0.1340   1st Qu.: 0.074  
##  Median : 600554   Median :36.00   Median :0.1840   Median : 0.092  
##  Mean   : 627886   Mean   :40.83   Mean   :0.1928   Mean   : 0.096  
##  3rd Qu.: 892634   3rd Qu.:36.00   3rd Qu.:0.2500   3rd Qu.: 0.117  
##  Max.   :1255725   Max.   :60.00   Max.   :0.4975   Max.   : 0.284  
##                                                     NA's   :29084   
##   ProsperScore        EmploymentStatus IsBorrowerHomeowner
##  Min.   : 1.00   Employed     :67322   Mode :logical      
##  1st Qu.: 4.00   Full-time    :26355   FALSE:56459        
##  Median : 6.00   Self-employed: 6134   TRUE :57478        
##  Mean   : 5.95   Not available: 5347   NA's :0            
##  3rd Qu.: 8.00   Other        : 3806                      
##  Max.   :11.00                : 2255                      
##  NA's   :29084   (Other)      : 2718                      
##  CurrentCreditLines TotalInquiries    CurrentDelinquencies
##  Min.   : 0.00      Min.   :  0.000   Min.   : 0.0000     
##  1st Qu.: 7.00      1st Qu.:  2.000   1st Qu.: 0.0000     
##  Median :10.00      Median :  4.000   Median : 0.0000     
##  Mean   :10.32      Mean   :  5.584   Mean   : 0.5921     
##  3rd Qu.:13.00      3rd Qu.:  7.000   3rd Qu.: 0.0000     
##  Max.   :59.00      Max.   :379.000   Max.   :83.0000     
##  NA's   :7604       NA's   :1159      NA's   :697         
##  AmountDelinquent   BankcardUtilization DebtToIncomeRatio
##  Min.   :     0.0   Min.   :0.000       Min.   : 0.000   
##  1st Qu.:     0.0   1st Qu.:0.310       1st Qu.: 0.140   
##  Median :     0.0   Median :0.600       Median : 0.220   
##  Mean   :   984.5   Mean   :0.561       Mean   : 0.276   
##  3rd Qu.:     0.0   3rd Qu.:0.840       3rd Qu.: 0.320   
##  Max.   :463881.0   Max.   :5.950       Max.   :10.010   
##  NA's   :7622       NA's   :7604        NA's   :8554     
##  StatedMonthlyIncome LoanOriginalAmount LoanOriginationQuarter
##  Min.   :      0     Min.   : 1000      Q4 2013:14450         
##  1st Qu.:   3200     1st Qu.: 4000      Q1 2014:12172         
##  Median :   4667     Median : 6500      Q3 2013: 9180         
##  Mean   :   5608     Mean   : 8337      Q2 2013: 7099         
##  3rd Qu.:   6825     3rd Qu.:12000      Q3 2012: 5632         
##  Max.   :1750003     Max.   :35000      Q2 2012: 5061         
##                                         (Other):60343         
##    Investors       CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :   1.00   Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:   2.00   1st Qu.:660.0         1st Qu.:679.0        
##  Median :  44.00   Median :680.0         Median :699.0        
##  Mean   :  80.48   Mean   :685.6         Mean   :704.6        
##  3rd Qu.: 115.00   3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :1189.00   Max.   :880.0         Max.   :899.0        
##                    NA's   :591           NA's   :591
##          ListingNumber                   Term           BorrowerRate 
##                 113066                      3                   2294 
##        EstimatedReturn           ProsperScore       EmploymentStatus 
##                   1477                     12                      9 
##    IsBorrowerHomeowner     CurrentCreditLines         TotalInquiries 
##                      2                     55                    103 
##   CurrentDelinquencies       AmountDelinquent    BankcardUtilization 
##                     48                   7060                    202 
##      DebtToIncomeRatio    StatedMonthlyIncome     LoanOriginalAmount 
##                   1208                  13502                   2468 
## LoanOriginationQuarter              Investors  CreditScoreRangeLower 
##                     33                    751                     27 
##  CreditScoreRangeUpper 
##                     27
##          ListingNumber                   Term           BorrowerRate 
##                      0                      0                      0 
##        EstimatedReturn           ProsperScore       EmploymentStatus 
##                  29084                  29084                      0 
##    IsBorrowerHomeowner     CurrentCreditLines         TotalInquiries 
##                      0                   7604                   1159 
##   CurrentDelinquencies       AmountDelinquent    BankcardUtilization 
##                    697                   7622                   7604 
##      DebtToIncomeRatio    StatedMonthlyIncome     LoanOriginalAmount 
##                   8554                      0                      0 
## LoanOriginationQuarter              Investors  CreditScoreRangeLower 
##                      0                      0                    591 
##  CreditScoreRangeUpper 
##                    591

Univariate Plots Section

## 
##    12    36    60 
##  1614 87778 24545

Most Prosper loans have a term of 36 months, followed by 60 months. Very few loans have a 12-month term.

## 
##  Cancelled Chargedoff  Completed    Current   Past Due 
##          5      11992      38074      56576       7290

The first graph shows the number of Prosper loans by category of loan status. As there are multiple ‘Past Due’ columns that have low counts, we group them into one field as shown in the second graph. Naturally, most cases are current or completed. There are about 7,290 past-due loan cases, which constitutes about 6.4% of all loans with Prosper.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##      0.0      0.0      0.0    984.5      0.0 463900.0     7622

## 
##     0 
## 89818
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1     246    1067    6345    5061  463900
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.391   3.028   3.040   3.704   5.666

Almost 80% of borrowers, also 89,818 out of 113,937 don’t have delinquency. 7,622 of the rest are unavailable. Taking the log of AmountDelinquent, the plot shows a normal distribution with log10 value of 3 being the peak, which is a few thousand dollars on average.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    7.00   10.00   10.32   13.00   59.00    7604

The distribution of current credit lines skewed right, peaking around 7 to 10.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

The loan original amount is right skewed, having multiple peaks of $3,000, $10,000, and $15,000. Most loans are below the amount of $12,000.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

The borrower rate, also the interest rate of consumers peaks at about 3.2% with more than 5,500 cases. The range goes from 5% to 36% for most consumers. And the average value of borrower rate is 19.3%.

## 
##     1     2     3     4     5     6     7     8     9    10    11 
##   992  5766  7642 12595  9813 12278 10597 12053  6911  4750  1456

Prosper rate is a custom risk score built using historical Prosper data. The score ranges from 1-10, with 10 being the best, or lowest risk score’. There are some score outliers that are larger than the maximum score 10. Other than that, the Prosper score ‘prefers’ to take even values, 4, 6 and 8.

## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 16965 58308  7433  7189  2395   756  2572 10494   199    85    91   217 
##    12    13    14    15    16    17    18    19    20 
##    59  1996   876  1522   304    52   885   768   771

This is an interesting histogram that shows the purpose of a personal loan. Code number 1 (Debt Consolidation) has almost 60K records surpassing all the other listing catogories.

The income range follows a normal distribution, with most borrowers having $25000~$49000 and $50000~$74999. Interestingly some borrowers have $0 income. Also there doesn’t exist a ‘higher than $100000’ bucket, which is not uncommon in high-tech regions such as California.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

As a requirement of Prosper loan application, the debt to income ratio should be less than 0.5 to qualify. Most of the loan cases in the dataset satisfies this rule with some outliers. In most cases, the debt to income ratio for consumers ranges from 0.1 to 0.3. This is a useful indicator for people who are interested in applying for a Prosper loan.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     591
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     591

The plot dipicts the lower range and upper range of credit scores of borrowers. Lower is shown is red while upper in blue.

The distribution of credit scores of borrowers is close to normal distribution, peaking around 680 to 700.

## 
## 2005 Q4 2006 Q1 2006 Q2 2006 Q3 2006 Q4 2007 Q1 2007 Q2 2007 Q3 2007 Q4 
##      22     315    1254    1934    2403    3079    3118    2671    2592 
## 2008 Q1 2008 Q2 2008 Q3 2008 Q4 2009 Q2 2009 Q3 2009 Q4 2010 Q1 2010 Q2 
##    3074    4344    3602     532      13     585    1449    1243    1539 
## 2010 Q3 2010 Q4 2011 Q1 2011 Q2 2011 Q3 2011 Q4 2012 Q1 2012 Q2 2012 Q3 
##    1270    1600    1744    2478    3093    3913    4435    5061    5632 
## 2012 Q4 2013 Q1 2013 Q2 2013 Q3 2013 Q4 2014 Q1 
##    4425    3616    7099    9180   14450   12172

The first plot is not in chronological order. After a rearrange of year and quarter, we have the second time series graph that depicts ups and downs in number of loan listings the company has. Prosper was founded in year 2005, which explains the low count in the fourth quarter of 2005. The business gradually grew in the next few years, then had a very sharp decline in 2008 because of the global financial crisis. The recovery took some time when the number of loans climbed up in 2009 and 2010. By the end of 2011, the company regained the history-high record. Overall, the growth was most impressive in year 2013 when the loan amount rocketed to almost 15,000 in the fourth quarter.

Univariate Analysis

What is the structure of your dataset?

There are 113,937 observations of 81 variables in the Prospect Loan dataset. In this analysis we are going to focus on fields - Term, BorrowerRate, ProsperScore, EmploymentStatus, IsBorrowerHomeowner, CurrentCreditLines, CurrentDelinquencies, AmountDelinquent, DebtToIncomeRatio,StatedMonthlyIncome,LoanOriginalAmount, LoanOriginationQuarter, CreditScoreRangeLower and CreditScoreRangeUpper etc.

A few definitions of the above fields are vital to our analysis

Term The length of the loan expressed in months.

BorrowerRate The Borrower’s interest rate for this loan.

ProsperScore A custom risk score built using historical Prosper data. The score ranges from 1-10, with 10 being the best, or lowest risk score.
Applicable for loans originated after July 2009. (Note that there are 29084 missing values.)

CreditScoreRangeLower/Upper The lower/upper value representing the range of the borrower’s credit score as provided by a consumer credit rating agency.

AmountDelinquent Dollars delinquent at the time the credit profile was pulled.

DebtToIncomeRatio The debt to income ratio of the borrower at the time the credit profile was pulled. This value is Null if the debt to income ratio is not available. This value is capped at 10.01 (any debt to income ratio larger than 1000% will be returned as 1001%).

CurrentCreditLines Number of current credit lines at the time the credit profile was pulled.

ListingCategory The category of the listing that the borrower selected when posting their listing: 0 - Not Available, 1 - Debt Consolidation, 2 - Home Improvement, 3 - Business, 4 - Personal Loan, 5 - Student Use, 6 - Auto, 7- Other, 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans, 13 - Household Expenses, 14 - Large Purchases, 15 - Medical/Dental, 16 - Motorcycle, 17 - RV, 18 - Taxes, 19 - Vacation, 20 - Wedding Loans

LoanOriginalAmount The origination amount of the loan.

Some observations from the single variable exploration

  1. The term of loan are mostly 36 and 60 months, in other words 3 or 5 years.

  2. Almost half of the records are current loans. 6.4% are past due.

  3. The IQR of the borrower rate is 14% ~ 32%, with 32% being the most popular rate of all loans.

  4. Prosper score follows a pseudo normal distribution in even and odd sets of scores respectively. It’s interesting that there are overall more even scores than odd scores in the middle range 3~8.

  5. The credit score median is normal distributed, peaking around 680 - 700.

  6. Almost 80% of borrowers don’t have any delinquency.

  7. The borrowers’ credit Lines is right skewed, peaking around 7 to 10.

  8. 75 Quantile of loan amount is $12,000. There are several peaks at $3,000, $10,000, and $15,000.

  9. More than half of the listings are created for the purpuse of Debt Consolidation, followed by Home Improvement, Business and Auto.

  10. Most borrowers have an income of $25000 to 75000 annually.

  11. The number of loans at Prosper follows a chronological pattern.

What is/are the main feature(s) of interest in your dataset?

The most important features in the dataset are BorrowerRate and ProsperScore. For a consumer seeking for a peer-to-peer personal loan, the interest rate will be the key factor that one considers when shopping around. I am interested in finding the relationship between BorrowerRate and ProsperScore, which is solely based on Prosper history of the consumer who is applying. For that reason, there are other factors from the credit history perspective that also contributes to the BorrowerRate, such as CreditScoreRangeLower/Upper.

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

The term, consumer’s credit history, time of application and monthly income will help support my investigation.

Did you create any new variables from existing variables in the dataset?

I regrouped LoanStatus ‘Past Due’ section into one, because each sub-category of ‘Past Due’ has too few records compared to other categories. The final histogram of 5 main categories is more meaningful to the audience.

Moreover, I rearranged the LoanOriginationQuarter variable so that the value of the new catogories are chronological, which provides convenience and clarity in the time series histogram. We could see how Prosper’s loan business fluctuate over time.

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

The ProsperScore is supposed to range from 0 to 10, but I found some outliers valued at 11. These could be bador old data(earlier than July 2009), so I excluded them from the histogram by setting the limits on x axis. In this way, we have a clean vision of the distribution of ProsperScore.

Bivariate Plots Section

At the beginning of this section, I create a right subset function that scans the string from the right side. Also I create two new variables. The first is the CreditScoreMedian calculated from CreditScoreRangeLower and Upper. The second is CreditHistory calculated from FirstRecordedCreditLine. CreditHistory is on the level of years.

The above ggpairs graph only reflect the 56,576 current loans with Prosper. It still provides a full insight on the most up-to-date loan information that assists future analysis between two and multiple variables. As we can see, the ggcorr graph provides a clear illustration on variable correlation. Pairs in dark red and blue will be focused on in the follow-up analysis.

Prosper interest rate is lower when a borrower has higher ProsperScore, proving the strong linear correlation of -0.7.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3667    5167    6153    7447 1750000

The stated monthly income axis is tailored as the majority of monthly income are well below $20,000. From the second scatterplot, there doesn’t exist a strong correlation between income and borrower rate.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0577  0.1314  0.1760  0.1838  0.2310  0.3304

The credit score axis is adjusted to normal credit range by common sense. It looks from the graph that CreditScore and BorrowerRate are negatively correlated at level of -0.5 by the Pearson correlation test.

The boxplot shows the relationship between prosper score and employment status. Full-time consumers have the highest IQR and mean of prosper score, while Self-employed category have the lowest. As we can observe, the distribution of full-time category is skewed, since people may have all kinds of occupations whose income and credit habits vary a lot. It is surprising that people who are ‘Retired’ have higher prosper score than Employed and Self-employed categories. I wonder what the differenc is between category Employed and Full-time.

How active is each type of employment in credit cards and loans activities? The first boxplot tells us that the part-time group is the least active in having loans and openning credit cards. The Employed group have the most outliers, followed by Full-time and Self-employed. It is understandable that financial institutions are more willing to open credit lines for small business or consumers who have a stable income.

Zooming in to credit lines of 0~25, it’s clear that Employed consumers have the highest credit lines open of about 11, while Not-employed and Part-time have the lowest average of about 7.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##      0.0      0.0      0.0    984.5      0.0 463900.0     7622

This jitter plot graph shows that consumers who have are homeowners have a longer tailed delinquent distribution, thus slightly higher delinquency amount if exists than those who are mortgage care-free.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

Prosper requires a consumer to have debt to income ratio of less than 50% to qualify for the loan application. From the first graph there are lots of outliers above 50%, even up to 1001% where the value is capped. Zooming in, we can see there is a full range of debt to income ratio from 0 to 60% and 70% for consumers with a credit history of 10 to 40 years. The peak is at 20~25 years of credit history. Let’s say a consumer has the first credit card at age 20. Adding 20 years makes him 40 plus, which is about the age most people have a home and kids to supply. It makes sense these consumers have more debts at the peak of their financial burden and responsibilities.

Next we want to build a simple linear regression model to predict the borrower rate, using prosper score as the only predictor.

## 
## Calls:
## m1: lm(formula = I(BorrowerRate) ~ I(ProsperScore), data = subset(loan.lm1, 
##     !is.na(ProsperScore)))
## 
## ===================================
##   (Intercept)            0.317***  
##                         (0.001)    
##   I(ProsperScore)       -0.020***  
##                         (0.000)    
## -----------------------------------
##   R-squared              0.422     
##   adj. R-squared         0.422     
##   sigma                  0.057     
##   F                  61989.938     
##   p                      0.000     
##   Log-likelihood    123078.549     
##   Deviance             273.094     
##   AIC              -246151.098     
##   BIC              -246123.052     
##   N                  84853         
## ===================================
## 
##  Pearson's product-moment correlation
## 
## data:  BorrowerRate and ProsperScore
## t = -248.98, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6536072 -0.6458311
## sample estimates:
##        cor 
## -0.6497361

We can observe a negative relationship between the borrower rate and prosper score. The Pearson correlation test also indicates a negative linear relationship between the two variables(correlation = -0.65).

As for the linear model, the explanatory variable ProsperScore has a p value much less thatn 0.05, and it accounts for 42.2% of the variance in BorrowerRate. The parameter of prosper score is -0.02, meaning that as prosper score increases by one unit, the borrower rate decreases by 0.02, which is 2%.

In the next section we are going to include more credit-related and loan-related variables that could explain BorrowerRate.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

First, prosper score has a negative positive relationship with interest rate. The higher the prosper score, the lower(the better) the interest rate.

Second, there doesn’t exist a noticable linear correlation between stated monthly income and borrower rate. As we see the pearson correlation coefficient is as low as 0.1.

Additionally, full-time and employed consumers have the highest average prosper score and also the highest average number of open credit lines. Self-employed consumers have high number of credit lines, most likely for business reasons.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

A borrower who already has mortgage tend to have more amount delinquency as the pressure of mortgage and household must be high.

Borrowers having 20~30 years of credit history have the highest debt to income ratio, with some outliers rocketing up to more than 1000%. The ratio becomes much lower for people who have 40 years of credit history, making them about 60 years old.

What was the strongest relationship you found?

The debt to income ratio of consumers is very concentrated around the area of credit history ranging 10 to 35. A negative relationship can be observved from the plotting as well. As a requirement to apply for personal loans at Prosper, a borrower need to have minimum two years of credit history, and the reported average value is 11, which are both verified from the graph. As these two variables are vital criteria in loan application, we are going to investigate them more in the next modeling section.

Multivariate Plots Section

## # A tibble: 10 x 4
##    BorrowerState loan_count loan_mean BorrowerStateFull
##    <fct>              <int>     <dbl> <chr>            
##  1 CA                 14717     8974. california       
##  2 TX                  6842     9088. texas            
##  3 NY                  6729     8833. new york         
##  4 FL                  6720     8207. florida          
##  5 IL                  5921     8396. illinois         
##  6 ""                  5515     5412. <NA>             
##  7 GA                  5008     8363. georgia          
##  8 OH                  4197     8078. ohio             
##  9 MI                  3593     7645. michigan         
## 10 VA                  3278     8971. virginia

Grouped the loan data from listing level to state level, calculating the number of loan for each state as loan_count and average loan amount as loan_mean from LoanOriginalAmount.

The printed data frame is ordered by decending loan_count. The top 5 states having the highest number of loan counts are California, Texas, New York, Florida and Illinois.

Below is a geographical heatmap of the number of Prosper loans in the United States in Prosper history, which presents the same information as the above data.

## # A tibble: 6 x 4
##   BorrowerState loan_count loan_mean BorrowerStateFull   
##   <fct>              <int>     <dbl> <chr>               
## 1 DC                   382    10126. district of columbia
## 2 NJ                  3097     9529. new jersey          
## 3 MA                  2242     9515. massachusetts       
## 4 NH                   551     9463. new hampshire       
## 5 AK                   200     9312. alaska              
## 6 MD                  2821     9250. maryland

Additionally, the second heat map dipicts the average loan_mean for each state. It’s shown that although the states on the east coast have low loan counts, their loan amount is very high. This interesting combination is probably due to the distance from the Prosper hub on the west coast but also the fast-growing economy on the east coast.

Next is BorrowerRate over loan origination year in an effort to check if there is an trend in the interest rate of borrowers over time.

We only observe the top 6 states with the highest loan_count here. Variable LoanOrigYear is created from LoanOriginationQuarter by only capturing the year the loan was created.

This faceted boxplot provides an amazing illustration on how the BorrowerRate fluctuates over time, in other words, in the general economy background.

The financial crisis really hit the economy and the personal loan market in 2008 and 2009, forcing vendors like Prosper lower their interest rate to attract borrowers. From the perspective of the states, Texas and New York didn’t do very well in particular during the crisis. In comparison, California, Gorgia and Illinois suffered less hit.

The loan business came back to live in 2011 and 2012 as the economy started to revitalize. We can observe a very good performance for New York and Texas.

Similarly, the jitter point graph also indicates much less business in 2008 and fast growth in 2011 and 2012 as the density of colors change. In particular, we see long term loans as shown in light blue starting to boom in 2011. This also relates back to the previous boxplot that in 2011 and 2012 the average of BorrowerRate was high.

Multivariate linear model - Iteration one

Use Prosper history information as well as loan-related information as predictors for BorrowerRate.

## 
## Calls:
## m1: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(CreditScoreMedian), 
##     data = loan)
## m2: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(CreditScoreMedian) + 
##     Term, data = loan)
## m3: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(CreditScoreMedian) + 
##     Term + LoanOrigYear, data = loan)
## m4: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(CreditScoreMedian) + 
##     Term + LoanOrigYear + LoanOriginalAmount, data = loan)
## m5: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(CreditScoreMedian) + 
##     Term + LoanOrigYear + LoanOriginalAmount + StatedMonthlyIncome, 
##     data = loan)
## 
## ========================================================================================================
##                               m1              m2              m3              m4              m5        
## --------------------------------------------------------------------------------------------------------
##   (Intercept)                 0.645***        0.639***       52.623***       48.522***       48.513***  
##                              (0.003)         (0.003)         (0.259)         (0.276)         (0.276)    
##   I(ProsperScore)            -0.017***       -0.017***       -0.019***       -0.018***       -0.018***  
##                              (0.000)         (0.000)         (0.000)         (0.000)         (0.000)    
##   I(CreditScoreMedian)       -0.000***       -0.000***       -0.000***       -0.000***       -0.000***  
##                              (0.000)         (0.000)         (0.000)         (0.000)         (0.000)    
##   Term                                        0.000***        0.001***        0.001***        0.001***  
##                                              (0.000)         (0.000)         (0.000)         (0.000)    
##   LoanOrigYear                                               -0.026***       -0.024***       -0.024***  
##                                                              (0.000)         (0.000)         (0.000)    
##   LoanOriginalAmount                                                         -0.000***       -0.000***  
##                                                                              (0.000)         (0.000)    
##   StatedMonthlyIncome                                                                         0.000***  
##                                                                                              (0.000)    
## --------------------------------------------------------------------------------------------------------
##   R-squared                   0.5057          0.5066          0.6653          0.6717          0.6718    
##   adj. R-squared              0.5057          0.5066          0.6653          0.6717          0.6717    
##   sigma                       0.0525          0.0524          0.0432          0.0428          0.0428    
##   F                       43399.7450      29044.6500      42164.2025      34720.3395      28939.1491    
##   p                           0.0000          0.0000          0.0000          0.0000          0.0000    
##   Log-likelihood         129702.0207     129784.8619     146246.3116     147066.2547     147072.0398    
##   Deviance                  233.6208        233.1651        158.1822        155.1545        155.1333    
##   AIC                   -259396.0414    -259559.7238    -292480.6232    -294118.5094    -294128.0796    
##   BIC                   -259358.6467    -259512.9804    -292424.5312    -294053.0687    -294053.2901    
##   N                       84853           84853           84853           84853           84853         
## ========================================================================================================

The six vairables accounts for 67.18% of the variation of the BorrowerRate. The p value is less than 0.05 for each predictor, so we are confident to reject the null hypothesis that coefficients are 0. Therefore, the coefficients are valid in the above chart. In other words, the 6 variables are all meaningful in prediction of the borrower interest rate at Prosper.

Multivariate linear model - Iteration two

A few adjustments are made to the second iteration of the model.

  1. Use log transformation on large values CreditScoreMedian, LoanOriginalAmount, StatedMonthlyIncome. Adjust if the value before log is less than 1.

  2. Use year as unit for Term instead of month.

  3. Change LoanOrigYear from numeric to character because the quantity doesn’t contribute to the ProsperScore by common sense.

  4. Add three credit-related predictors of interest - CreditHistory, DebtToIncomeRatio, IsBorrowerHomeowner.

## 
## Calls:
## m1: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(log(CreditScoreMedian)), 
##     data = loan)
## m2: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(log(CreditScoreMedian)) + 
##     I(Term/12), data = loan)
## m3: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(log(CreditScoreMedian)) + 
##     I(Term/12) + as.factor(LoanOrigYear), data = loan)
## m4: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(log(CreditScoreMedian)) + 
##     I(Term/12) + as.factor(LoanOrigYear) + log(LoanOriginalAmount), 
##     data = loan)
## m5: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(log(CreditScoreMedian)) + 
##     I(Term/12) + as.factor(LoanOrigYear) + log(LoanOriginalAmount) + 
##     log(StatedMonthlyIncome + 1), data = loan)
## m6: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(log(CreditScoreMedian)) + 
##     I(Term/12) + as.factor(LoanOrigYear) + log(LoanOriginalAmount) + 
##     log(StatedMonthlyIncome + 1) + DebtToIncomeRatio, data = loan)
## m7: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(log(CreditScoreMedian)) + 
##     I(Term/12) + as.factor(LoanOrigYear) + log(LoanOriginalAmount) + 
##     log(StatedMonthlyIncome + 1) + DebtToIncomeRatio + CreditHistory, 
##     data = loan)
## m8: lm(formula = I(BorrowerRate) ~ I(ProsperScore) + I(log(CreditScoreMedian)) + 
##     I(Term/12) + as.factor(LoanOrigYear) + log(LoanOriginalAmount) + 
##     log(StatedMonthlyIncome + 1) + DebtToIncomeRatio + CreditHistory + 
##     IsBorrowerHomeowner, data = loan)
## 
## ======================================================================================================================================================================
##                                             m1              m2              m3              m4              m5              m6              m7              m8        
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
##   (Intercept)                               2.642***        2.645***        2.672***        2.574***        2.593***        2.624***        2.637***        2.647***  
##                                            (0.019)         (0.019)         (0.015)         (0.015)         (0.015)         (0.016)         (0.016)         (0.016)    
##   I(ProsperScore)                          -0.017***       -0.017***       -0.019***       -0.018***       -0.018***       -0.018***       -0.018***       -0.018***  
##                                            (0.000)         (0.000)         (0.000)         (0.000)         (0.000)         (0.000)         (0.000)         (0.000)    
##   I(log(CreditScoreMedian))                -0.358***       -0.359***       -0.359***       -0.332***       -0.333***       -0.344***       -0.346***       -0.347***  
##                                            (0.003)         (0.003)         (0.002)         (0.002)         (0.002)         (0.002)         (0.002)         (0.003)    
##   I(Term/12)                                                0.002***        0.010***        0.013***        0.013***        0.013***        0.013***        0.013***  
##                                                            (0.000)         (0.000)         (0.000)         (0.000)         (0.000)         (0.000)         (0.000)    
##   as.factor(LoanOrigYear): 2010/2009                                        0.008***        0.010***        0.010***        0.009***        0.009***        0.009***  
##                                                                            (0.001)         (0.001)         (0.001)         (0.001)         (0.001)         (0.001)    
##   as.factor(LoanOrigYear): 2011/2009                                       -0.001           0.007***        0.006***        0.004***        0.004***        0.004***  
##                                                                            (0.001)         (0.001)         (0.001)         (0.001)         (0.001)         (0.001)    
##   as.factor(LoanOrigYear): 2012/2009                                       -0.013***       -0.005***       -0.005***       -0.007***       -0.007***       -0.007***  
##                                                                            (0.001)         (0.001)         (0.001)         (0.001)         (0.001)         (0.001)    
##   as.factor(LoanOrigYear): 2013/2009                                       -0.066***       -0.053***       -0.053***       -0.054***       -0.054***       -0.054***  
##                                                                            (0.001)         (0.001)         (0.001)         (0.001)         (0.001)         (0.001)    
##   as.factor(LoanOrigYear): 2014/2009                                       -0.087***       -0.073***       -0.073***       -0.072***       -0.072***       -0.072***  
##                                                                            (0.001)         (0.001)         (0.001)         (0.001)         (0.001)         (0.001)    
##   log(LoanOriginalAmount)                                                                  -0.012***       -0.011***       -0.012***       -0.012***       -0.012***  
##                                                                                            (0.000)         (0.000)         (0.000)         (0.000)         (0.000)    
##   log(StatedMonthlyIncome + 1)                                                                             -0.002***        0.003***        0.003***        0.002***  
##                                                                                                            (0.000)         (0.000)         (0.000)         (0.000)    
##   DebtToIncomeRatio                                                                                                         0.013***        0.012***        0.012***  
##                                                                                                                            (0.001)         (0.001)         (0.001)    
##   CreditHistory                                                                                                                             0.000***        0.000***  
##                                                                                                                                            (0.000)         (0.000)    
##   IsBorrowerHomeowner                                                                                                                                       0.001*    
##                                                                                                                                                            (0.000)    
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
##   R-squared                                 0.5076          0.5086          0.6964          0.7053          0.7061          0.7108          0.7111          0.7112    
##   adj. R-squared                            0.5076          0.5086          0.6964          0.7052          0.7061          0.7108          0.7111          0.7111    
##   sigma                                     0.0524          0.0523          0.0411          0.0405          0.0405          0.0397          0.0397          0.0397    
##   F                                     43734.2730      29276.8924      24325.9049      22558.1288      20382.1758      17328.0006      15908.9086      14686.2339    
##   p                                         0.0000          0.0000          0.0000          0.0000          0.0000          0.0000          0.0000          0.0000    
##   Log-likelihood                       129867.0694     129956.3904     150382.5630     151641.7652     151759.6549     140104.2494     140147.8055     140150.2032    
##   Deviance                                232.7137        232.2243        143.4886        139.2925        138.9059        122.4809        122.3434        122.3358    
##   AIC                                 -259726.1389    -259902.7809    -300745.1260    -303261.5304    -303495.3099    -280182.4989    -280267.6110    -280270.4064    
##   BIC                                 -259688.7442    -259856.0375    -300651.6393    -303158.6949    -303383.1258    -280062.1349    -280137.9882    -280131.5249    
##   N                                     84853           84853           84853           84853           84853           77557           77557           77557         
## ======================================================================================================================================================================

Similar to the first iteration,the p value is less than 0.05 for each predictor, indicating that we are confident to reject the null hypothesis that coefficients are zero. This means that all credit-related additions are meaningful to the model. Moreover, there is an overall improvement in R squared from 67.18% to 71.12%. If we look closer, there is actually an improvement on each level of predictor addition, meaning that the variable transformation and addition are worthwhile.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

As we can see from the jitter point graph ‘Trend in interest rates for different loan terms’, the same conclusion can be drawn as the one variable qplot in the univariate plot section.

On top of that, time series observation is made when we look at the BorrowerRate over time.

Also, in terms of Loan OriginationQuarter, the set of faceted boxplots ‘Trend in interest rates in top 6 states’ has convinced the economy influence on Prosper’s loan business, as is initially observed in the univariate section as well.

Were there any interesting or surprising interactions between features?

There are two interesting observations.

On the one hand, it’s very helpful to observe the loan count as well as the loan average in the geographical graph of the United States. We know that most loans are from California where Prosper is hubbed, while the highest average loan is DC, an east-coast area. I wonder what reasons are behind that drive people to seek for large amounts of personal loans. From my perspective, California is also a high-salary state with many tech companies. Merely thinking from the economy perspective is not enough to answer this question intuitively.

On the other hand, the fluctuation of BorrowerRate over time follows similar trend as the number of loans over time. This is not surprising but very interesting to think about. More research could be initiated to look more into each State and different types of loans to see what levels of BorrowRate there are.

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.

I created a multivariate model and made a modification on it.More credit-related variables are included to increase the R squared, so that up to 71.12% variation of BorrowerRate is explained. This model is strong enough,having Prosper history, loan features and credit history, although more predictors should be added to strengthen the model even more. Some other variables that could explain the BorrowerRate potentially are TotalInquiries, OpenRevolvingAccounts and PublicRecordsLast12Months.

Final Plots and Summary

Plot One

Description One

This is a univariate distribution of BorrowerRate. Refined the theme and axis ranges.The highest frequency of interest rate of a consumer’s loan with Prosper is 32%, followed by range 14%~20%.

Plot Two

Description Two

This is an upgrade on the negative relationship between interest rate and credit score, also adding a fitted line and layer of loan term. The fitted curve is close to linear, predictable in credit score range 550~850. The darker layer of dots are above the lighter layer, indicating that longer loan term usually means higher interest rate.

Plot Three

Description Three

This is a geographical heatmap of the average Prosper loan amount in the United States. California has the highest loan count overall, followed by Texas, and Florida etc. There are very few loan listings originated from states in the middlewest.

Reflection

The Prosper loan dataset is a large and resourceful one that contains 114K observations and more than 80 variables. Some variables have missing values or outliers that complicate the analysis here and there. I made a lot of efforts digging into the distribution and definition of each variable to find meaningful trend, features and relationship among data.

Luckily, I was able to apply all of the techniques I learned from the course to this project. I also researched online for more interesting graphic methods, such as the geographical heatmap. Some plots gave me a surprise, such as the Prosper collected interest and fees, while some others are insightful and meaningful, such as the loan amount and interest rate over time.

Additionally, I was able to build a predictive model on the borrower interest rate by using 9 predictors in three categories - prosper history, loan features and historical credit performance. This model has a high explanation and predictive value for any consumers who are interested in having a personal loan with Prosper.

Nevertheless, more work could be done using this dataset. I could look more into details on the cohorts of a time period, a few states, or one listing category. As for modeling, I could include more variables to increase the explanatory value of the multi-linear model, or I could fit other types of models that may have a higher predictive value.